Abstract

This paper addresses the problem of finding the nearest neighbor (or one of the R-nearest neighbors) 
of a query object q in a database of n objects. In contrast with most existing approaches, we can only
access the "hidden" space in which the objects live through a similarity oracle. The oracle, given two 
reference objects and a query object, returns the reference object closest to the query object. The oracle 
attempts to model the behavior of human users, capable of making statements about similarity, but not of 
assigning meaningful numerical values to distances between objects. Using such an oracle, the best we 
can hope for is to obtain, for every object u in the database, a sorted list of the other objects according 
to their distance to u. We call the position of object v in this list the rank of v with respect to u. The 
difficulty of searching using such an oracle depends on the non-homogeneities of the underlying space. 
We use two different characterizations of the underlying space to capture this property. The first one, 
rank distortion, relates pairwise ranks to the average difference in ranks w.r.t. other objects (a more 
precise definition is given in Section II). The second one, the combinatorial framework (a notion from
[1], [2]), defines approximate triangle inequalities on ranks (a more precise definition is given in Section
II). Roughly speaking, it defines a multiplicative factor D by which the triangle inequality on ranks
can be violated. Utilizing the insights from these ideas, we develop a hierarchical search algorithm 
that builds a data structure, which allows us to retrieve the nearest neighbor with high probability in 
O(D^3 log^2 n log log n^{D^3}) questions. The learning requires asking O(n D^3 log^2 n log log n^{D^3}) questions
in total and we need to store O(n log^2 n / log(2D)) bits in total. We also provide an approximate nearest
neighbor search algorithm. Finally, we show a lower bound of Ω(D log(n/D) + D^2) on the average number of
questions in the search phase for randomized algorithms when the answers to all possible questions in 
the learning phase are given. We also introduce rank-sensitive hash (RSH) functions, which give the same hash
value for "similar" objects based on the rank-value of the objects obtained from the similarity oracle. As
one application of RSH, we demonstrate that we can retrieve one of the (1 + ε)r-nearest neighbors of a
query point in evaluations of the hash function, where θ only depends on ε and the rank distortion.
I. Introduction

Consider the situation where we want to search and navigate a database, but we do not know the 
underlying relationships between the objects. In particular, distances may be difficult to discern, or may 
not be well-defined. Such situations are common with objects where human perception may be involved. 
A collection of pictures of faces, taken from different angles and distances is an illustration of such 
a dataset. Indeed, the distances between feature vectors might be far from the similarity perceived by 
humans. Notwithstanding, either with human-assistance or approximate classification, we may be able to 
determine the relative proximity of an object with respect to a small number of other objects.¹ Humans
have the ability to compare objects and make statements about which are the most similar ones, though 
they can probably not assign a meaningful numerical value to similarity. This led to the question of how 
to design search algorithms based on binary similarity decisions of the type "A looks more like B than 
C". 

More formally, we aim to design an algorithm that given a query object (e.g., a face), efficiently 
returns an object that is similar to that object among the objects in a database. To do so, we have access 
to a similarity oracle which, given two reference objects and a query object, can tell which of the two 
reference objects is most similar to the query object. We measure the performance of all our algorithms 
in terms of the number of questions that we need to ask the oracle. We can pre-process the database 
during a learning phase, and use the resulting answers to facilitate the search process. 

We do not make the assumption that the "hidden" space in which the database objects live needs 
to be a metric space. Using this oracle, one can retrieve, for every object u in the database, a sorted
list of the other objects according to their distance to u. We call the position of object v in this list
the rank of v with respect to u, and denote it by r_u(v). Clearly, this relationship can be asymmetric,
i.e., r_u(v) ≠ r_v(u) in general. This setup raises several new questions and issues, as any space can
be described by its rank relationships. How much does the fact that the rank of some object v w.r.t.
some other object u is k, and the rank of w w.r.t. u is k', tell us about the rank of w w.r.t. v? In this
paper, we introduce the notion of rank distortion (see Section II for a rigorous definition). The rank
distortion captures how closely r_v(w) is related to the average (1/n) Σ_u |r_u(v) − r_u(w)|. The framework
introduced in [1], [2] defines approximate triangle inequalities on the ranks, another way to capture these
relationships. Those inequalities roughly tell us how "transitive" the similarity relationship is and give us 
a notion of combinatorial disorder. If we have this information, we can use partial rank information to 
estimate, or infer the other ranks. In this paper, we will first investigate the case where we can use such a 
characterization of the hidden space as an input to our algorithms. We develop a randomized hierarchical 
scheme that improves the existing bounds for nearest neighbor search based on a similarity oracle (see 
Section IV-A). We also prove, as far as we know, the first lower bound on the average number of questions
to be asked for randomized nearest-neighbor search in this setup (see Section IV). Then, in Section V,
we ask what can be done if no characterization of the hidden space is known and therefore cannot be 
used as an input to the algorithms. In that case, we cannot estimate, or limit, ranks anymore if we have 
partial rank information. Nevertheless, we develop algorithms that can decompose the space such that 
dissimilar objects are likely to get separated, and similar objects have the tendency to stay together. This 
generalizes the notion of randomized k-d-trees ([4]) to our setup. Building on this intuition, we introduce
the notion of rank-sensitive hashing (RSH) in Section V-C. Similarly to locality-sensitive hashing, we
can retrieve one of the R nearest neighbors of a query point very efficiently. The hash function itself 
does not require any characterization of the subjacent space as an input. However, the smallest value of 
R we can choose depends on the rank distortion. In general, both the criteria (combinatorial disorder and 
rank distortion) we use to characterize the hidden space seem to capture how "homogeneous" that space 
is. It appears that the less homogeneous it is, the more difficult it becomes to search. In particular, if the
rank relationship is very asymmetric, and some objects are far from every other object, the information
contained about those objects in the ranks matrix is very sparse and hard to capture. We apply this idea
of RSH to NN search, but we believe that this might be useful in other scenarios as well.

¹We have implemented such a human-assisted system for a database of faces in a project called "facebrowser" [3].

A. Relationship to published works 

The nearest neighbor (NN) problem, and many variations thereof, have been extensively studied in 
the literature (see for instance [5] and [6] for surveys). In particular, very efficient algorithms have been
developed for specific classes of metric spaces, such as metric spaces with a low intrinsic dimension
or a bounded growth factor. In [7], the authors introduce ε-nets, a very simple data structure for
nearest neighbor search (and many other applications). The complexity of those nets depends on the
doubling dimension of the underlying space. In [8], the authors present a random sampling algorithm to
produce a data structure for search in growth restricted metrics. The restricted growth guarantees that a 
random sample will have some nice properties. In particular, by randomly selecting a small number of 
representatives at different scales for every object in a learning phase, one can zoom in on the nearest 
neighbor of a query point during the search phase. On the other hand, search when the underlying space is 
not necessarily a metric space appears to have very little prior work. In some sense, it is a generalization 
of the above problem, as any dataset can be represented by its rank relationships. The problem of 
searching with a similarity oracle was first studied in [2], where a random walk algorithm is presented.
The main limitation of this algorithm is the fact that all rank relationships need to be known in advance, 
which amounts to asking the oracle O(n^2 log n) questions, in a database of n objects. The authors of [1]
and [2] work with a combinatorial framework for nearest neighbor search, which defines approximate
inequalities for ranks analogous to the triangle inequality for distances. Their bounds depend crucially 
on the combinatorial disorder, represented by the disorder constant D of the database (a notion to be 
defined more formally in Section II, which captures to what extent the triangle inequality on ranks can
be violated). In [1], a data structure similar in spirit to the ε-nets of [7] is introduced. It is shown that a
learning phase with complexity O(D^7 n log^2 n) questions and a space complexity of O(D^5 n + D n log n)
allows one to retrieve the nearest neighbor in O(D^4 log n) questions, in a database of n objects. The learning
phase builds a hierarchical structure based on coverings of exponentially decreasing radii.³ We will show
(see Section III) that we can improve those bounds by a factor polynomial in D, if we are willing to
accept a negligible (smaller than 1/n) probability of failure. Our algorithm is based on random sampling,
and hence can be seen as a form of metric skip list (as introduced in [8]), but applied to a combinatorial
(non-metric) framework. However, the fact that we do not have access to distances forces us to use new 
techniques in order to minimize the number of questions we need to ask (or ranks we need to compute). 
In particular, we sample the database at different densities, and infer the ranks from the density of the 
sampling, which we believe is a new technique. We also need to relate samples to each other when 
building the data structure top down. We also present what we believe is the first lower bound for our 
problem of searching through comparisons. 

A natural question to ask is whether one can develop data structures for NN when a characterization 
of the underlying space is unknown. This has been addressed in the case when the underlying metric 
space has low "intrinsic" dimension and one has access to metric distances in [5], [4]. In [4], it is shown
that one can build a binary tree decomposition of a dataset of points in ℝ^d, such that the diameter of
the sets in the tree is reduced by a constant after a number of levels that only depends on the intrinsic
dimension of the data, and not d. The term intrinsic dimension either refers to the Assouad or the
covariance dimension. Therefore, one can similarly ask such a natural question in our framework where 
we do not have access to metric distances (or they do not exist). We develop a binary tree (hierarchical)
decomposition for the case when the characteristics of the underlying space (disorder constant) are unknown. This
extends the result of [4] to our framework, where we only have access to the underlying space through
comparisons.

²Our interest in this formulation arose from an applied viewpoint in the implementation of the facebrowser system [3].
³The radius of a ball is defined as the cardinality of that ball.

The approximate nearest neighbor problem consists in finding an element that is at distance at most 
(1 + ε) d_min from the query point q, where d_min = min_i d(i, q). In [9], Indyk and Motwani present
two algorithms for this problem, in particular locality-sensitive hashing, through which they obtain an
algorithm with polynomial learning and query time polynomial in d and log n. For binary vectors, it is
remarkable that the performance of the algorithm does not depend on the dimension. A survey of results
for LSH can be found in [10]. In [11], Panigrahy shows that instead of using a large number of hash
tables, as is the case in the approach above, only a few can be used. These are then hashed to several
randomly chosen objects in the neighborhood of the query point, and it is shown that at least one
of them will fall into the same bucket as the nearest neighbor. The authors of [12] prove a lower bound on
the parameter ρ = log p / log P for (r, cr, p, P)-locality sensitive hashing schemes. We present a new hashing
scheme that is rank-sensitive (RSH). How efficient the scheme is depends on another property of the 
hidden space, its rank-distortion. The rank-distortion need not be an input to the algorithm, however, 
the performance will depend on it. We give sufficient conditions for RSH to work and demonstrate its 
application to NN search. 

To the best of our knowledge, the notions of rank-sensitive hashing and approximate (and randomized)
nearest neighbor search using a similarity oracle are studied for the first time in this paper. Moreover, the
hierarchical search scheme proposed is more efficient than earlier schemes. The lower bound presented 
appears to be new and demonstrates that our schemes are (almost) efficient. 

II. Definitions



In this section, we define formally the notions that we use in the rest of the paper. We consider a
hidden space K with distance function d(.,.), and a database of objects T ⊂ K, with |T| = n. We
do not have access to the distances between the objects in K directly. We can only access this space
through a similarity oracle which, for any point q ∈ K and objects u, v, returns:

Oracle(q, u, v) = u if d(q, u) < d(q, v), and v otherwise.

For the sake of simplicity, we consider that all distances in K are different. Note that the objects do not
need to be in an underlying metric space for this similarity oracle. We now define the notion of rank.

Definition 1. The rank of u in a set S with respect to v, r_v(u, S), is equal to c if u is the c-th nearest
object to v in S.

To simplify the notation, we only indicate the set if it is unclear from the context, i.e., we write r_v(u)
instead of r_v(u, S) unless there is an ambiguity. Note that rank need not be a symmetric relationship
between objects, i.e., r_u(v) ≠ r_v(u) in general. Further, note that we can rank m objects w.r.t. an object
o by asking the oracle O(m log m) questions. To do so, create the ranking w.r.t. o by adding one object
at a time. Observe that in order to add the (i + 1)-th object to the list, we need to ask O(log i) questions.
More precisely, we first ask whether the (i + 1)-th object is closer to o than the object currently at
position i/2. Then, we recurse on the corresponding half of the list (e.g., if the object to insert is closer
than the (i/2)-th object, select the (i/4)-th object as the new "pivot"). Summing over i, the total number of
questions to be asked to sort m objects is O(m log m).
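
The ranking procedure just described can be sketched as a binary insertion sort driven only by oracle answers. This is a minimal illustration, not the authors' implementation: the names `make_oracle` and `rank_objects` are hypothetical, and the hidden distance passed to `make_oracle` exists only so that the demo oracle can answer questions.

```python
from typing import Callable, List, Sequence


def make_oracle(dist: Callable[[float, float], float]):
    """Build a similarity oracle from a hidden distance; also count the questions asked."""
    stats = {"questions": 0}

    def oracle(q, u, v):
        # Returns whichever of the two reference objects u, v is closer to q.
        stats["questions"] += 1
        return u if dist(q, u) < dist(q, v) else v

    return oracle, stats


def rank_objects(o, objects: Sequence, oracle) -> List:
    """Sort `objects` by increasing distance to o, one binary-search insertion at a time."""
    ranking: List = []
    for x in objects:
        lo, hi = 0, len(ranking)
        while lo < hi:                      # binary search for the insertion position
            mid = (lo + hi) // 2
            if oracle(o, x, ranking[mid]) == x:
                hi = mid                    # x is closer to o than the pivot
            else:
                lo = mid + 1                # the pivot is closer to o than x
        ranking.insert(lo, x)
    return ranking
```

For instance, with the absolute difference on a line as hidden distance, `rank_objects(0, [5, 1, 9, 3], oracle)` orders the four objects by proximity to 0, and `stats["questions"]` reports how many comparisons were needed.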

Our first characterization of the space of objects is through a form of approximate triangle inequalities
first introduced in [1] and [2].⁴ Instead of defining a relationship between distances, these triangle
inequalities define a relationship between ranks. These relationships depend on a property of the space
called the disorder constant D. In [1] and [2], four such inequalities are defined, all implying the others
with D' = D^2.

⁴We have another characterization called rank distortion in Definition 3.

Definition 2. The rank disorder of a set of objects S is the smallest D such that ∀x, y, z ∈ S, we have
the following approximate triangle inequalities:

1) r_x(y, S) ≤ D (r_z(x, S) + r_z(y, S))

2) r_x(y, S) ≤ D (r_x(z, S) + r_y(z, S))

3) r_x(y, S) ≤ D (r_x(z, S) + r_z(y, S))

4) r_x(y, S) ≤ D (r_z(x, S) + r_y(z, S))

In particular, r_x(x, S) = 0 and r_x(y, S) ≤ D r_y(x, S).
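
To make Definition 2 concrete, the following sketch computes the disorder constant of a small set by brute force over all triples. It is only an illustration under assumptions: the function names are hypothetical, the ranks are derived from an explicit distance (which the paper's setting does not expose), and the cubic enumeration is only practical for tiny sets.

```python
from itertools import product


def rank_matrix(points, dist):
    """r[x][y] is the rank of y w.r.t. x (0 for x itself), assuming distinct distances."""
    n = len(points)
    r = [[0] * n for _ in range(n)]
    for x in range(n):
        order = sorted(range(n), key=lambda y: dist(points[x], points[y]))
        for rank, y in enumerate(order):
            r[x][y] = rank
    return r


def disorder_constant(r):
    """Smallest D for which all four approximate triangle inequalities hold."""
    n = len(r)
    D = 0.0
    for x, y, z in product(range(n), repeat=3):
        right_hand_sides = (
            r[z][x] + r[z][y],   # inequality 1
            r[x][z] + r[y][z],   # inequality 2
            r[x][z] + r[z][y],   # inequality 3
            r[z][x] + r[y][z],   # inequality 4
        )
        for s in right_hand_sides:
            if s > 0:            # s = 0 only arises when r[x][y] = 0 as well
                D = max(D, r[x][y] / s)
    return D
```

Choosing z = x in the first inequality already forces D ≥ 1 for any set with at least two objects, so the interesting question is how far above 1 the returned value lies.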

We define a rank-ball of radius r around some point x as B_x(r) = {i ∈ S | r_x(i) < r}. A ball in
distance is defined as β_u(r) = {i ∈ S | d(u, i) < r}.

We further define the rank matrix R where R_ij = r_i(j), and the matrix W = R + R^T (note that the
matrix W is symmetric). For a subset S ⊂ K, we define its diameter Δ_S = max_{i,j ∈ S} W_ij. Let ρ_i denote
the i-th column of R, i.e., we associate to every object o ∈ T a vector ρ_o ∈ {0, ..., n − 1}^n, such that the
j-th coordinate of ρ_o is given by r_j(o).

We now define the rank-distortion of a set S as follows: 

Definition 3. We say of a set of objects S that its rank distortion function is f : N+ → N+ if f is
monotonically increasing and if there exists γ > 0 (the rank distortion) such that ∀u, v ∈ S:

f(r_u(v)) ≤ ‖ρ_v − ρ_u‖_1 ≤ γ f(r_u(v))

Lemma 1. If the function f is linear, i.e., f(r_u(v)) = c r_u(v), then the four approximate triangle inequalities
are implied with D ≤ γ.

For example, for the first inequality, we have r_x(y, K) ≤ ‖ρ_x − ρ_y‖_1 / c ≤ (‖ρ_x − ρ_z‖_1 + ‖ρ_z − ρ_y‖_1) / c ≤
γ (r_z(x, K) + r_z(y, K)). The proof for the other inequalities is similar.
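
As a small worked example of Definition 3, the sketch below checks a candidate distortion function f against a given rank matrix and reports the smallest γ that makes the upper bound hold. The helper names are hypothetical, and the matrix used in the usage note is simply the rank matrix of four points placed at 0, 1, 3 and 7 on a line, chosen for illustration.

```python
def rho(R, o):
    """Column o of the rank matrix R: coordinate j is r_j(o)."""
    return [R[j][o] for j in range(len(R))]


def l1(a, b):
    """L1 distance between two rank vectors."""
    return sum(abs(s - t) for s, t in zip(a, b))


def check_distortion(R, f):
    """Return (lower_ok, gamma) for a candidate distortion function f.

    lower_ok tells whether f(r_u(v)) <= ||rho_v - rho_u||_1 for all u != v;
    gamma is the smallest constant with ||rho_v - rho_u||_1 <= gamma * f(r_u(v)).
    """
    n = len(R)
    lower_ok, gamma = True, 0.0
    for u in range(n):
        for v in range(n):
            if u == v:
                continue
            dist = l1(rho(R, v), rho(R, u))
            base = f(R[u][v])            # assumed positive for ranks >= 1
            lower_ok = lower_ok and base <= dist
            gamma = max(gamma, dist / base)
    return lower_ok, gamma
```

With `R = [[0, 1, 2, 3], [1, 0, 2, 3], [2, 1, 0, 3], [3, 2, 1, 0]]` and the identity as f, the lower bound holds for every pair and the upper bound requires γ = 6 on this toy set.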
We can define the nearest neighbor problem as follows: 

Definition 4 (R-nearest neighbor problem). Given a set of objects T and a query point q, return one
of the R objects in T closest to q. In particular, if R = 1, return the closest object to q in T.

We now define what it means for a hashing scheme to be (r, R, p, P)-rank-sensitive.

Definition 5. We call a hashing scheme h "(r, R, p, P)-rank-sensitive" if ∀q ∈ K, u ∈ T,

Pr[h(q) = h(u) | r_q(u, T) ≤ r] ≥ p and Pr[h(q) = h(u) | r_q(u, T) > R] ≤ P.

Note that we should have P < p.

Finally, we say that a result holds with high probability (w.h.p.) if it holds with probability higher
than 1 − 1/n.

III. Contributions 

One of the difficulties of searching a hidden space arises from the fact that we cannot know how
transitive the rank relationship is i.e., we cannot know whether the fact that A is similar to B, and B 
is similar to C implies that A is similar to C. This is problematic in the sense that even if the oracle 
tells us that A is closer to our query point than B, it does not necessarily imply that points close to 
A are better candidates than points close to B. In metric spaces, such a guarantee is provided by the 
triangle inequality. A way to characterize the hidden space is to limit the extent to which the triangle 
inequality on ranks can be violated. The combinatorial framework introduced in [1], [2] (see the definition
of approximate triangle inequalities) does exactly that. In this paper, we improve on their results in two
ways. We provide more efficient algorithms using randomization and also provide a new lower bound 
for such randomized algorithms. More precisely, we show that if we only require success with high 
probability for nearest neighbor search, we can exploit the fact that a sample of randomly chosen points 
will have nice properties. In particular, it will be very likely that every object in the database will have an 
object sampled that is similar to itself. By sampling more and more densely at every level of a hierarchy, 
we will ultimately sample all objects. The key observation is that in order to find the sample closest 
to a particular object, we will only need to look at objects for which the closest sample at the level 
above in the hierarchy was also close to that object. We introduce a conceptually simple randomized 
hierarchical scheme that allows us to reduce the learning compared to the existing algorithm (see [1],
[2]) by a factor D^4, memory consumption by a factor D^5 / log^2 n, and search by a factor D / (log n log log n^{D^3})
(see Section IV-A). This algorithm's performance is best when the disorder constant is small.

Theorem 1. There exists a data structure which, for a given query point q, can retrieve its nearest
neighbor with high probability in O(D^3 log^2 n log log n^{D^3}) questions. The learning requires asking
O(n D^3 log^2 n log log n^{D^3}) questions in total. We need to store O(n log^2 n / log(2D)) bits in total.

We then prove a lower bound on the average search time to retrieve the nearest neighbor of a query 
point for randomized algorithms. Our result confirms the intuition we have developed so far. Indeed, the 
higher the disorder constant, the more difficult it becomes to search. One way to interpret this result 
is that the higher the disorder constant D, the less information the answer to a question to the Oracle 
provides us. 

Theorem 2. There exists a space, a configuration of a database of n objects in that space and a
distribution over placements of the query point q such that no randomized search algorithm, even if
O(n^3) questions can be asked in the learning phase, can find q's nearest neighbor in the database for
sure (with a probability of error of 0) by asking fewer than an expected Ω(D log(n/D) + D^2) questions.

Consequently, our schemes are asymptotically (in n) within a factor O(D) of the optimal scheme
(i.e., within D polylog(n) questions of the optimal search algorithm). The proofs of those two theorems
are provided in Section IV.

Clearly, one of the limitations of the schemes above is that we need to know the disorder constant. 
It might be possible to estimate the value of the disorder constant based on a sample of objects in the 
database. Limitations of this approach are the fact that we might considerably degrade the performance 
of the algorithms if the estimator is inaccurate, and that we might run into trouble if the query point 
does not come from the same distribution as the database points T. We therefore extend, in Theorem
6, the idea of k-d-trees to our setup. We provide an algorithm to build a binary tree that adapts to the
disorder of the hidden space (see [4] for an analogous result for ℝ^d). In Section V-C, we present a
new rank-sensitive hash function with many potential applications. The idea of rank-sensitive hashing
is that by computing many times a hash function drawn at random, similar objects will be assigned the
same hash value more frequently than dissimilar objects. The performance of the rank-sensitive hashing
scheme depends on the rank distortion of the hidden space. Instead of capturing how "transitive" the
rank relationship is, the rank distortion captures how the rank r_u(v) relates to the average rank difference, i.e.,
E[|r_x(v) − r_x(u)|]. In other words, if we picked an object x at random, and sorted all other objects w.r.t.
this object, how would |r_x(v) − r_x(u)| relate to r_u(v)? If r_u(v) can be approximated by a function f
of E[|r_x(v) − r_x(u)|], then we can exploit this fact to separate points close to q and points far from q.

Theorem 3. Given a set of objects S with rank-distortion function f and rank distortion γ, there exists
a function h which is (r, (1 + e)r, 1 — '^t^, 1 — '^^^l^^^'^^ ) -rank-sensitive. 

A special case is when the function f is constant. Then, the behavior of the function is similar to
the one observed with locality-sensitive hashing for binary vectors. One of the consequences is that we
can retrieve one of the i? = (1 + e)r nearest neighbors of a query point q in n'^^?^ questions. By using 
the output of the hash function in a different way, we can compute an overall ranking of the objects. 
We can then retrieve "popular" objects i.e., those which are close to many other objects. This idea is 
discussed in Section IV-BI 

IV. Searching with Known Disorder Constant 

In this section, we make the assumption that the disorder constant D, of T U {q} is known, and that 
we can consequently use it as an input to our algorithms. Knowing D is an advantage, as it allows one to 
rapidly exclude some candidate objects during the search phase. In other words, we can take advantage 
of the fact that if we found an object close to the query point q, then objects which are far from that 
object cannot be the nearest neighbor of q. We first present an algorithm for nearest-neighbor search. 
The algorithm builds a hierarchical decomposition of the test set T. The construction succeeds with
high probability, i.e., for a fixed query point q, the data structure is such that it will return q's nearest
neighbor w.h.p. Then, we present a lower bound on the search complexity. 

A. Hierarchical Data Structure For Nearest-Neighbor Search 

The learning phase is described in Algorithm 1. The algorithm builds a hierarchical decomposition
level by level, top-down. At each level, we sample objects from the database. The set of samples at level
i is denoted by S_i, and we have |S_i| = m_i = a(2D)^i log n, where a is a constant independent of n and
D. At each level i, every object in T is put in the "bin" of the sample in S_i closest to it. To find this
sample at level i, for every object o we rank the samples in S_i w.r.t. o (by using the oracle to make
pairwise comparisons). However, we will show that given that we know D, we only need to rank those
samples that fell in the bin of one of the at most 4aD log n nearest samples to o at level i − 1. This is
a consequence of the fact that we carefully chose the density of the samples at each level. Further, the 
fact that we build the hierarchy top-down, allows us to use the answers to the questions asked at level i, 
to reduce the number of questions we need to ask at level i + 1. This way, the number of questions per 
object does not increase as we go down in the hierarchy, even though the number of samples increases. 
The search process is described in Algorithm 2. The key idea is that the sample closest to the query point
on the lowest level will be its nearest neighbor. Hence, by repeating the same process as for inserting 
objects in the database, we can retrieve the nearest neighbor w.h.p. 
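
The learning and search phases described above can be sketched in a few lines of Python. This is a simplified illustration under several assumptions that are not taken from the paper: the hidden space is the real line, samples are drawn without replacement and capped at n, the last level is forced to contain every object, logarithms are natural, and the names `learn`, `search`, `step` and `sort_by_rank` are hypothetical.

```python
import math
import random
from functools import cmp_to_key


def oracle(q, u, v):
    """Similarity oracle over a hidden 1-D space (assumption made for the demo)."""
    return u if abs(q - u) < abs(q - v) else v


def sort_by_rank(o, cands):
    """Sort candidates by increasing distance to o using only oracle answers."""
    return sorted(cands, key=cmp_to_key(lambda u, v: -1 if oracle(o, u, v) == u else 1))


def step(o, S, prev_sorted, prev_phi, limit):
    """One level: keep the samples of S whose closest sample one level up is near o."""
    if prev_sorted is None:                      # top level: rank all samples
        cands = list(S)
    else:
        near = set(prev_sorted[:limit])          # the (at most) `limit` nearest samples
        cands = [v for v in S if prev_phi[v] in near]
    if not cands:
        raise RuntimeError("failure: empty candidate set")
    return sort_by_rank(o, cands)


def learn(db, D, a=1.0, seed=0):
    """Build the sample sets S_i and, per level, the closest sample phi of every object."""
    rng = random.Random(seed)
    n = len(db)
    L = math.ceil(math.log(n) / math.log(2 * D))
    limit = math.ceil(4 * a * D * math.log(n))
    samples, phis = [], []
    prev = {z: None for z in db}
    for i in range(1, L + 1):
        m = min(n, math.ceil(a * (2 * D) ** i * math.log(n)))
        S = list(db) if i == L else rng.sample(db, m)   # assumption: last level is all of T
        prev_phi = phis[-1] if phis else None
        prev = {z: step(z, S, prev[z], prev_phi, limit) for z in db}
        samples.append(S)
        phis.append({z: prev[z][0] for z in db})
    return samples, phis, limit


def search(q, samples, phis, limit):
    """Repeat the insertion process for q and return the closest sample on the last level."""
    prev = None
    for i, S in enumerate(samples):
        prev = step(q, S, prev, phis[i - 1] if i else None, limit)
    return prev[0]
```

A call such as `samples, phis, limit = learn(db, D=2)` followed by `search(q, samples, phis, limit)` walks the hierarchy for a query q. The sketch ranks candidates with a generic comparison sort rather than the question-counting argument of the analysis, so it illustrates the structure of the scheme, not its question complexity.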

We will now show that Algorithm 1 succeeds with high probability and bound the number of questions
it asks.

Theorem 4. Algorithm 1 succeeds with probability higher than 1 − 1/n (w.h.p.) and it requires asking
fewer than O(n D^3 log^2 n log log n^{D^3}) questions w.h.p.

Proof: See Appendix A. ∎
The proof of Theorem 1 is then immediate and is given in Appendix B. Note that this scheme can be
easily modified for R-nearest neighbor search. At the i-th level of the hierarchy, the closest sample to q
will, w.h.p., be one of its n/(2D)^i nearest neighbors. If we are only interested in that level of precision, we
can consequently stop the construction of the hierarchy at the desired level.

B. Lower Bound 

In this section, we show that there exist configurations of n objects in a graph metric for which no
search algorithm can be guaranteed to find the nearest neighbor of a query point in fewer than an expected
Ω(D log(n/D) + D^2) questions. We make the assumptions that all possible questions related to the n
database objects can be asked during the learning phase, and even that the structure of the database 
input : A database with n objects z_1, ..., z_n and disorder D
output: For each object u, a vector φ_u of length log n / log(2D). The list of all samples ∪_i S_i

1.1  for i ← 1 to L = log n / log 2D do
1.2      Let S_i be a set of a(2D)^i log n objects chosen u.a.r. in the database T;
1.3      for j ← 1 to n do
1.4          if i = 1 then
1.5              c_j(1) ← S_1
1.6          else
1.7              c_j(i) ← {v ∈ S_i | position of φ_v(i − 1) in c'_j(i − 1) smaller than 4aD log n};
                 /* c_j(i) is the set of samples in S_i for which the closest
                    sample in S_{i−1} was one of the (at most) 4aD log(n)
                    closest samples to z_j in S_{i−1} */
1.8          end
1.9          if |c_j(i)| = 0 then
1.10             Report Failure
1.11         else
1.12             c'_j(i) ← sort c_j(i) according to r_{z_j}(v, S_i), ∀v ∈ c_j(i);
1.13             φ_j(i) ← first object in c'_j(i);
                 /* φ_j(i) is the sample in S_i closest to z_j */
1.14         end
1.15     end
1.16 end

Algorithm 1: Learning Algorithm



input : A database with n objects and disorder D, the list of samples, the vectors φ, a query
        point q
output: The nearest neighbor of q in the database

2.1  c'_q(1) = S_1;
2.2  for i ← 2 to L = log n / log 2D do
2.3      c_q(i) ← {v ∈ S_i | position of φ_v(i − 1) in c'_q(i − 1) smaller than 4aD log n};
2.4      c'_q(i) ← sort c_q(i) according to r_q(v, S_i), ∀v ∈ c_q(i);
2.5  end
2.6  return first object in c'_q(log n / log 2D)

Algorithm 2: Search Algorithm



is known. Then, we attach a query point to the database constellation in a random way. Consider the
graph shown in Fig. 2. It is a star with α branches, each composed of n/α^2 supernodes. All edges in
this part of the graph have weight 1. Inside each supernode, there are α database objects: each supernode
has a root node that connects it to the other supernodes, and α objects, each connected to the root with a
different edge. The weights of these edges range from 1/(4α) to α/(4α). Finally, the query point will be
connected to one object on every branch of the star. Hence, the query point has α direct neighbors (one
on each branch of the star). The edges connecting the query point to the graph have weights ranging
from 1 to 1 + ε, where ε ≪ 1/(4α). Note that we cannot know which are the direct neighbors of the query
point, nor what the weights of the corresponding edges are. Thus, given the n database objects and the 
answers to all possible questions we can ask about the database, we need to find the nearest neighbor
of the query point. First, we show that this structure has disorder Θ(α) (proof in Appendix C).

Lemma 2. The graph shown in Fig. 2 with the shortest path distance has disorder constant D = Θ(α).

In the proof of Theorem 2 (see Appendix D for a full proof), we show that we can lower bound the
expected running time of any randomized algorithm on the example of Figure 2. The idea of the proof
is that we must identify and compare all direct neighbors to the query point, and then find the nearest
neighbor among the direct neighbors. We show that we cannot identify all direct neighbors in fewer than
an expected Ω(D log(n/D) + D^2) questions.

V. Searching with Unknown Characterization 

When the disorder constant is unknown, we cannot be sure to retrieve the nearest neighbor of a query
point q, unless we ask O(n) questions.⁵ In Section IV, we heavily relied on the fact that we could find
objects close to the query point. Knowing the disorder constant then allowed us to exclude other objects 
as near neighbors. If we do not know D, we can still hope that by building a hierarchical decomposition, 
dissimilar objects will be separated rapidly as we walk down from the root to a leaf. Hence, we would 
also expect objects similar to q to be close to it in the tree. However, we cannot bound this distance, as 
we cannot use D as an input to the algorithm. First, however, we will give an example of a simple and 
intuitive algorithm that shows how much we can gain by knowing the disorder constant. 

A. Consequences of knowing the disorder constant 

If we know that an object u is the j^^ nearest neighbor of an object x (i.e., we have rx{u,T) = j), 
and we are looking for an object y, such that ru{y,T) < (, then we know that y must lie in an annulus 
centered at x of a certain width around u i.e., we know that a < rx{y) < b, where a and b are functions 
of j and C- In particular, we have (proof in Appendix 10): 

Lemma 3. Consider three objects $x$, $y$, and $u$. Let $r_x(u) = j$ and $r_y(u) < \zeta$ (1), or $r_u(y) < \zeta$ (2).
Then, $y$ must lie in an annulus such that $\frac{j}{D} - \zeta < r_x(y) < D(j + \zeta)$.
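To make the lemma concrete, the following minimal sketch computes the integer rank interval it allows. The function name `annulus_bounds`, the symbol `zeta` for the rank threshold, and the clipping to the range 1..n are illustrative assumptions, not part of the original text.

```python
import math


def annulus_bounds(j: int, zeta: int, D: float, n: int) -> tuple[int, int]:
    """Integer ranks r (w.r.t. x) with j/D - zeta < r < D*(j + zeta), clipped to 1..n.

    j    : rank of u w.r.t. x, i.e. r_x(u) = j
    zeta : bound on r_y(u) (case 1) or on r_u(y) (case 2)
    D    : disorder constant, assumed known
    n    : number of objects in the database
    """
    lower = j / D - zeta          # strict lower bound from the lemma
    upper = D * (j + zeta)        # strict upper bound from the lemma
    lo = max(1, math.floor(lower) + 1)   # smallest integer strictly above `lower`
    hi = min(n, math.ceil(upper) - 1)    # largest integer strictly below `upper`
    return lo, hi
```

For example, with `j = 100`, `zeta = 5`, and `D = 2`, only ranks 46 through 209 w.r.t. `x` need to be considered; every other object can be excluded without asking the oracle.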

By sampling $m$ objects u.a.r., and computing all ranks w.r.t. these objects, we can thus narrow
down the search space to an annulus whose width depends on $D$ and on the rank of the closest sample
(proof in Appendix F).

Theorem 5. Given a query object $q \in \mathcal{K}$, we can retrieve one of its $R$ nearest neighbors in $T$ by asking
an expected $m + \log n + D + \frac{D^2 n}{mR} + 1$ questions, with constant probability. The learning phase requires
asking an expected $O(nm \log n)$ questions to sort all objects w.r.t. the $m$ samples.

In particular, by setting $m = \sqrt{n/R}$, we can retrieve one of the $R$ nearest neighbors with constant
probability in expected $O(D^2\sqrt{n/R})$ questions. The example is similar to what happens on a given level
in the hierarchical scheme of Section IV-A. The fact that we know $D$, as is illustrated by the algorithm
above, allows us to exclude some objects as being nearest neighbors. Indeed, if we have information 
about the rank of the sample w.r.t. the query point, or vice-versa, then we know the nearest neighbor 
must lie in an annulus of known width. On the other hand, if the disorder is unknown, we cannot exclude 
any object, whatever the density of the sampling. In the next section, we ask whether we can build a
data structure that adapts to the characteristics of the space, without requiring them as input. In other
words, we ask whether we can decompose the space in such a way that dissimilar objects are likely to 
be separated, and similar objects remain close to each other, without knowing a characterization of the 
space. 

¹ Unless we go sequentially through all objects and compare them to the current nearest neighbor, there could always be an
object closer to the query point.

B. Binary Tree Decomposition

A natural and simple way to build a data structure suited for search operations is to build a tree.
By recursively applying Algorithm 3, we can decompose the database into a binary tree. Clearly, this
algorithm does not require any characterization of the space as an input. As illustrated in Section V-A,
if we do not know $D$, we cannot limit the ranks if we only have partial rank information. However,
we can expect this decomposition to adapt to the structure of the underlying space. Let the expected



input : A set $S$ of objects in $T$
output: Two sets of objects $S_0$ and $S_1 = S \setminus S_0$

pick two objects $x_1$ and $x_2$ u.a.r. in $S$;
$S_0 = \emptyset$, $S_1 = \emptyset$;
forall $u \in S$ do
    if $O(x_1, x_2, u) = u$ then $S_0 = S_0 \cup \{u\}$ else $S_1 = S_1 \cup \{u\}$;
end

Algorithm 3: Rank-Ball Cut

diameter after the decomposition of $S$ into $S_0$ and $S_1$ be defined as $\Delta'_S = \frac{|S_0|}{|S|}\Delta_{S_0} + \frac{|S_1|}{|S|}\Delta_{S_1} \le \Delta_S$ (by
analogy to the notion in [11]). Observe that the diameter of a set (see its definition above) has the
following property (proof in Appendix G).

Lemma 4. The diameter of a set $S$ with $|S| = n$ is always less than or equal to $2n$, i.e., $\Delta_S \le 2n$, with
equality when $d(u, v) = d(v, u)$ in the hidden metric space (symmetric distance function).

We can compute the expected diameter after the decomposition of $S$ into $S_0$ and $S_1$. First, observe that
it will always decrease, as the cardinality of the two new sets must be smaller than or equal to the diameter
of $S$. Let us denote by $x_1$ and $x_2$ the two randomly selected points in the set $S$. Let $r_{x_1}(x_2) = k$.
By the approximate triangle inequality (1), for any pair of points $u$ and $v$ in $S_0$, we have $r_u(v) \le
D(r_{x_1}(u) + r_{x_1}(v)) \le 2Dk$. Hence, the diameter $\Delta_{S_0}$ must be smaller than or equal to $4Dk$. We can then
easily compute the expected diameter to be $\Delta'_S \le \frac{4D}{n}k^2 + 2n - 2k$. Further, the optimal value for $k$
is $k = \frac{n}{4D}$. However, by choosing $x_2$ at random, we cannot ensure that $r_{x_1}(x_2)$ takes a specific value.
Nevertheless, we know that the value of $k$ is uniformly distributed between 1 and $n$. Assume that we
want $\Delta'_S \le \epsilon 2n$, for some $\epsilon < 1$. Then, we can prove the following theorem (proof in Appendix H):



Theorem 6. Let $\epsilon < 1$. Then,

$$P\left[\Delta'_S \le \epsilon 2n\right] = \frac{1}{2D}\sqrt{1 - 8D(1 - \epsilon)}$$



Let a "good cut" be a cut such that the diameter is reduced by a factor $\epsilon$. The probability that we reduce
the diameter by a factor $\epsilon$ degrades with increasing values of $D$. Hence, even though the disorder constant
is not an input to the algorithm, the performance will depend on the disorder constant. For instance, if $D$
were constant, then we would reduce the diameter by a constant factor with constant probability. In general, we
roughly need $\frac{\log c}{\log(1/\epsilon)}$ good cuts to divide the diameter by a constant $c$. In any case, the depth of the binary
tree is $O(\log n)$ w.h.p. (proof in Appendix I). An interesting fact is that the probability that a node $u$
falls in the good set, i.e., the ball around $x_1$, is given by $\phi_u = P[u \in B_{x_1}(r_{x_1}(x_2))] \approx 1 - \frac{1}{n}\mathbb{E}_{x_1}[r_{x_1}(u)]$.
Hence, "outliers" are likely to be put in the same bin as other outliers, while similar objects are likely 
to be put in the same bin. For instance, an object $y$ far away from all other objects, such that $\forall u$
we have $r_u(y) = n$, will hardly ever be put in the good set. Conversely, if there is a set of very popular
nodes, which have a low rank w.r.t. all other objects, they will often end up in the good set. Consequently,
this function can be used to estimate how "central", or popular, an object is (analogous to the notion of
1-median in [13]). $Y_i = \sum_h \mathbb{1}\{\text{node } i \text{ is in the good set}\}$, where the sum goes over randomly selected
hash functions and $\mathbb{1}\{\cdot\}$ is the indicator function, will provide such an estimate. It also implies that outliers
are more likely to be separated from other objects. In particular, if we computed $k$ times the result of a
randomly chosen hash function $h$, $Y_u$ would be roughly equal to $k\phi_u$. By sorting the $Y_i$'s, we obtain a
ranking of the objects by popularity. In the next subsection, we will try to exploit this property to design
a hashing scheme.
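The decomposition and the popularity estimate described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the oracle is a toy one in which objects are points on a line, and the names `make_line_oracle`, `build_tree`, `popularity_counts`, `leaf_size`, and `max_tries` are hypothetical.

```python
import random


def make_line_oracle(position):
    """Toy similarity oracle (an assumption): objects are points on a line.

    oracle(x, a, b) returns whichever of a, b is closer to x (ties go to a),
    mirroring the O(x1, x2, u) convention of Algorithm 3.
    """
    def oracle(x, a, b):
        return a if abs(position[a] - position[x]) <= abs(position[b] - position[x]) else b
    return oracle


def rank_ball_cut(S, oracle, rng):
    """Algorithm 3: S0 is the ball around x1 that reaches out to x2, S1 is the rest."""
    x1, x2 = rng.sample(S, 2)
    S0, S1 = [], []
    for u in S:
        # One oracle question per object: is u closer to x1 than x2 is?
        (S0 if oracle(x1, x2, u) == u else S1).append(u)
    return S0, S1


def build_tree(S, oracle, rng, leaf_size=2, max_tries=32):
    """Recursively apply the cut; leaves are lists, internal nodes are (left, right) tuples."""
    if len(S) <= leaf_size:
        return list(S)
    for _ in range(max_tries):
        S0, S1 = rank_ball_cut(S, oracle, rng)
        if S1:  # S0 always contains x1 and x2, so only S1 can be empty
            return (build_tree(S0, oracle, rng, leaf_size, max_tries),
                    build_tree(S1, oracle, rng, leaf_size, max_tries))
    return list(S)  # give up splitting after max_tries degenerate cuts


def popularity_counts(T, oracle, k, rng):
    """Y_u: number of times u lands in the good set S0 over k random cuts of T."""
    counts = {u: 0 for u in T}
    for _ in range(k):
        S0, _S1 = rank_ball_cut(T, oracle, rng)
        for u in S0:
            counts[u] += 1
    return counts
```

Sorting the objects by their count gives the popularity ranking mentioned in the text; an isolated object is expected to receive a much smaller count than a central one.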

C. Rank-Sensitive Hashing 

We have developed the intuition that by randomly cutting out balls, it is more likely that similar objects 
will stay together, and dissimilar objects be separated. This should be sufficient, if we can amplify this 
property, to allow us to efficiently search for similar objects. Indeed, we will now show how we can use 
this technique to develop a rank-sensitive hashing scheme. The rank distortion provides us with a sufficient
condition for the scheme to work. Our hash function $h$ selects two objects u.a.r. in $T$ (say $x_1$ and $x_2$),
and assigns values $h(u) \in \{0, 1\}$ to all objects $u$ as follows:

$$h(u) = \begin{cases} 0 & \text{if } O(x_1, x_2, u) = u \\ 1 & \text{if } O(x_1, x_2, u) = x_2 \end{cases}$$

Note that computing $h$ requires asking a single question per object, and that the algorithm does not
require any characterization of the space as input. The function $h$ is $(r, (1 + \epsilon)r, p, P)$-rank-sensitive,
where the collision probabilities $p$ and $P$ are each of the form one minus a term determined by the rank
distortion function $f$. This is the result of Theorem 3, proved in Appendix J. A special case is when the function
$f$ is linear. Then, we obtain the following result (proof in Appendix K).

Corollary 1. We can retrieve one of the $(1 + \epsilon)r$-nearest neighbors in $T$ of a query point $q$, with constant
probability, by asking $n^{\frac{\gamma}{1+\epsilon}}$ questions, where $\gamma$ is the rank distortion of $T$, when the rank distortion
function is linear.

Intuitively, one situation where $f$ is roughly constant is when the underlying space is close to a line
in $\mathbb{R}^m$. Further, our numerics have shown that even for higher dimensions, when the underlying space
is homogeneous (e.g., points distributed u.a.r. in a unit box with wrap-around distances), the function
$f$ is very steep for small values of $r_u(v)$ and then almost linear. An example is given in Figure 4 in
Appendix L.
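The hash function defined in this subsection can be turned into a small index, sketched below. The amplification used here (concatenating `k` hash bits per table and keeping `L` tables) is the standard locality-sensitive hashing construction and is an assumption on our part, since the text only states that the search works "in a manner similar" to it. The toy oracle and the names `make_line_oracle`, `make_hash`, `RankSensitiveIndex`, and `closest` are hypothetical.

```python
import random
from collections import defaultdict


def make_line_oracle(position):
    """Toy oracle (an assumption): oracle(x, a, b) returns the one of a, b closer to x."""
    def oracle(x, a, b):
        return a if abs(position[a] - position[x]) <= abs(position[b] - position[x]) else b
    return oracle


def make_hash(T, oracle, rng):
    """Draw x1, x2 u.a.r. in T and return h with h(u) = 0 iff O(x1, x2, u) = u."""
    x1, x2 = rng.sample(T, 2)

    def h(u):
        return 0 if oracle(x1, x2, u) == u else 1   # one oracle question per evaluation
    return h


class RankSensitiveIndex:
    def __init__(self, T, oracle, k, L, seed=0):
        rng = random.Random(seed)
        self.tables = []
        for _ in range(L):
            hs = [make_hash(T, oracle, rng) for _ in range(k)]
            buckets = defaultdict(list)
            for u in T:
                buckets[tuple(h(u) for h in hs)].append(u)
            self.tables.append((hs, buckets))

    def candidates(self, q):
        """Objects sharing a bucket with q in at least one table (k questions per table)."""
        found = []
        for hs, buckets in self.tables:
            key = tuple(h(q) for h in hs)
            for u in buckets.get(key, []):
                if u not in found:
                    found.append(u)
        return found


def closest(q, cands, oracle):
    """Sequential scan: one question per candidate, keeping the closest seen so far."""
    best = cands[0]
    for u in cands[1:]:
        best = oracle(q, best, u)
    return best
```

A query would call `candidates(q)` and then `closest(q, ...)` on the result. Larger `k` makes buckets more selective, while larger `L` raises the chance that a true near neighbor collides with `q` in some table.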

VI. Conclusions 

We addressed the problem of finding an object similar to a query object among the objects in 
a large database. In contrast to most existing formulations, we asked whether the database can be 
searched efficiently if its distance information can only be accessed through a similarity oracle, and the 
underlying objects need not be in a metric space. The oracle is motivated by a human user who can 
make comparisons between objects but not assign meaningful numerical values to similarities between 
objects. This raises interesting new questions about what good properties of the rank relationships are, what
good and efficient algorithms are, and what the right characterization of such a space is. We worked with
two such characterizations in this paper: one that captures the transitivity of the rank relationship through
the disorder constant ($D$), and the other, rank distortion ($\gamma$), which captures how the rank $r_u(v)$ relates to
$\mathbb{E}_i[|r_i(v) - r_i(u)|]$. We presented a new randomized algorithm that improves the performance of existing
algorithms for the combinatorial framework, and proved a lower bound on the search complexity. We
also proposed a new characterization of the hidden space, rank distortion, and showed that the performance
of a novel rank-sensitive hashing scheme depends on that property. Rank-sensitive hashing enables
(approximate) nearest neighbor search in a manner similar to locality-sensitive hashing. We believe that
ideas of searching through comparisons form a bridge from many well-known search techniques in
metric spaces to perceptually important (non-metric space) situations.

Appendix

A. Proof of Theorem 4

We first prove two technical lemmas that we will need to prove Theorem 4.

Lemma 5. If we throw $m = ab \log n$ balls into $b$ bins, each chosen uniformly at random, then the first
bin will contain at least one ball with probability at least $1 - \frac{1}{n^a}$.

Proof: The probability that a bin contains no ball is

$$P[\text{a bin contains no ball}] = \left(1 - \frac{1}{b}\right)^{ab \log n} \le e^{-a \log n} = n^{-a}$$

■
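As a quick numerical sanity check of this bound, the sketch below compares the exact probability that a fixed bin stays empty with $n^{-a}$. It assumes the natural logarithm and rounds the number of balls up to an integer; the function names are illustrative.

```python
import math


def empty_bin_probability(a: float, b: int, n: int) -> float:
    """Exact probability that a fixed bin receives none of m = ceil(a*b*ln n) balls."""
    m = math.ceil(a * b * math.log(n))
    return (1 - 1 / b) ** m


def lemma5_bound(a: float, n: int) -> float:
    """Upper bound n^(-a) on the probability that the bin stays empty."""
    return n ** (-a)
```

For instance, with `a = 2`, `b = 8`, and `n = 1000`, the exact probability is below the bound of $10^{-6}$.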

Lemma 6. We throw $m$ balls into $n$ bins, each chosen uniformly at random. We number the bins from 1
to $n$. Then, the probability that the bins 1 to $\frac{n}{c}$ contain more than $(1 + \tau)m/c$ or less than $(1 - \tau)m/c$
balls is at most $2e^{-\tau^2 m/3c}$.

Proof: We throw the balls one after the other into the bins. Let $X_i = 1$ if the $i$-th ball falls in
one of the $\frac{n}{c}$ first bins, and 0 otherwise. Let $X = \sum_i X_i$. Clearly, we have $E[X] = m/c$, as $P[X_i = 1] =
1/c$ and all $X_i$'s are independent. By the Chernoff bound (see for instance [14], page 67), we have
$P[|X - E[X]| \ge \tau m/c] \le 2e^{-\tau^2 m/3c}$. ■

We are now ready to prove Theorem 4.

Proof: Let $m_i = a(2D)^i \log n$ denote the number of objects we sample at level $i$, and let $S_i$ be the
set of samples at level $i$, i.e., $|S_i| = m_i$. Here, $a$ is an appropriately chosen constant, independent of $D$
and $n$. Further, let $\lambda_i = \frac{n}{(2D)^{i-1}}$. We will first show that for every object $o \in T \cup \{q\}$, where $q$ is the
query point, the following five properties of the data structure are true w.h.p.

1) $|S_i \cap B_o(\lambda_{i+1})| \ge 1$

2) $|S_i \cap B_o(\lambda_i)| \le 4aD \log n$

3) $|S_{i+1} \cap B_o(\lambda_{i-1})| \le 16aD^3 \log n$

4) $|S_i \cap B_o(4\lambda_i)| \ge 4aD \log n$

5) $|S_{i+1} \cap B_o(4\lambda_{i-1})| \le 64aD^3 \log n$
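To get a feel for the sizes involved, the following sketch tabulates the per-level sample count $m_i = a(2D)^i \log n$ and rank scale $\lambda_i = n/(2D)^{i-1}$. Numbering the levels from 1, capping $m_i$ at $n$, stopping once $(2D)^i \ge n$, and using the natural logarithm are assumptions made for illustration.

```python
import math


def level_parameters(n: int, D: int, a: float = 1.0):
    """Return a list of (i, m_i, lambda_i) for the levels of the hierarchy."""
    levels = []
    i, scale = 1, 2 * D                     # scale tracks (2D)^i
    while True:
        m_i = min(n, math.ceil(a * scale * math.log(n)))  # cannot sample more than n objects
        lambda_i = n * 2 * D / scale                       # n / (2D)^(i-1)
        levels.append((i, m_i, lambda_i))
        if scale >= n:                      # lambda_{i+1} <= 1: deepest level reached
            break
        i, scale = i + 1, scale * 2 * D
    return levels
```

With `n = 4096` and `D = 2`, this produces six levels, with the rank scale shrinking by a factor of $2D = 4$ from one level to the next, in line with $L = \log_{2D} n$.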

Fix an object $o$ and a level $i$. To visualize the proof, place all objects in the database on a line, such that
the object $u$ with rank $r_o(u, T) = r$ is located at distance $r$ from $o$ (see Figure 1). Property 1 tells us that
at least one of the samples at level $i$ will be such that its rank w.r.t. $o$ is smaller than $\lambda_{i+1}$, i.e., $\exists s \in S_i$
s.t. $r_o(s) < \lambda_{i+1}$. Clearly, by Lemma 5, this is true with probability at least $1 - \frac{1}{n^a}$ (set $m = m_i$ and
$b = (2D)^i = \frac{n}{\lambda_{i+1}}$ in the lemma). Property 2 tells us that not too many objects can have rank less than
$\lambda_i$ at level $i$ w.r.t. $o$. Let $c = \frac{n}{\lambda_i} = (2D)^{i-1}$. Now, by Lemma 6 (set $m = m_i = a(2D)^i \log n$ and $\tau = 1$),
the probability that more than $2a(2D)^i \log n/(2D)^{i-1} = 4aD \log n$ samples are among the $\lambda_i = \frac{n}{(2D)^{i-1}}$
closest objects to $o$ is less than $2e^{-2aD \log n/3} = n^{-\Omega(a)}$. The proof of Property 3 is identical, except that
we replace $\lambda_i$ by $\lambda_{i-1}$. Then, we have $c = (2D)^{i-2}$, $\frac{2m_{i+1}}{c} = 16aD^3 \log n$, and the probability that
$|S_{i+1} \cap B_o(\lambda_{i-1})| > 16aD^3 \log n$ is smaller than $n^{-\Omega(a)}$, as before. For Property 4, we expect $8aD \log n$
objects to be sampled at level $i$ among the $4\lambda_i$ closest objects to $o$. Again, by Lemma 6, the probability
that less than half that many objects get sampled is at most $n^{-\Omega(a)}$. Finally, the proof of Property 5 is
almost identical to the proof of Property 2. By choosing $a$ large enough, we can make sure that the five
properties are true for all objects and all levels w.h.p. (take the union bound over the $n$ objects and the
$L = \log_{2D} n$ levels).

Fig. 1. We place all objects on the line such that the object $u$ with rank $r_o(u, T) = r$ is located at distance $r$ from $o$.
(The figure shows the ranks $1, 2, \ldots, n$ with respect to $o$ and, for level $i-1$, level $i$, and the lowest level $L$, the samples
that could be associated with a low-rank sample one level up, down to the samples that could be closest to the nearest neighbor.)

From now on, we assume that we are in the situation where Properties 1 to 5 are true for all objects
(which is the case w.h.p.). Again, fix an object $o$. Consider a sample $s \in S_{i+1}$ such that $r_o(s) < \lambda_{i+1}$
(note that Property 1 guarantees that there is at least one such sample). Further, let $s' \in S_i$ be the sample
at level $i$ closest to $s$, i.e., $s' = \arg\min_{t \in S_i} r_s(t)$. Again, by Property 1, we know that $r_s(s') < \lambda_{i+1}$. Hence,
by the approximate triangle inequality 3 (see Section II), we have:

$$r_o(s, T) < \lambda_{i+1} \text{ and } r_s(s', T) < \lambda_{i+1} \;\Rightarrow\; r_o(s', T) < 2D\lambda_{i+1} = \lambda_i$$

Consequently, we know that the sample that is closest to $o$ at level $i + 1$ will be in the bin of a sample
$s' \in S_i$ that has rank $r_o(s', T) < \lambda_i$. The algorithm associates every object $o$ to the closest sample on
each level. Hence, to find that sample for object $o$ at level $i + 1$, it would be sufficient to rank (w.r.t. $o$)
all samples in $S_{i+1}$ that fell in the bin of a sample at level $i$ that has rank less than $\lambda_i$. Property 2 tells us
that $|S_i \cap B_o(\lambda_i)| \le 4aD \log n$. Hence, by inspecting the bins of the at most $4aD \log n$ closest samples
to $o$ at level $i$, and ranking the samples at level $i + 1$ that fall in those bins, we are guaranteed to find
the closest sample. Property 4 tells us that all of the $4aD \log n$ closest samples to $o$ at level $i$ have rank
less than $4\lambda_i$. Consider a sample $s \in S_i$ such that $r_o(s, T) < 4\lambda_i$ and a sample $s'' \in S_{i+1}$ that falls in
the bin of $s$. By Property 1, we must have $r_{s''}(s, T) < \lambda_{i+1}$. Thus, by inequality 2, we have:

$$r_{s''}(s, T) < \lambda_{i+1} \text{ and } r_o(s, T) < 4\lambda_i \;\Rightarrow\; r_o(s'', T) < D(4\lambda_i + \lambda_{i+1}) < 4\lambda_{i-1}$$

By Property 5, there are at most $O(D^3 \log n)$ such samples at level $i + 1$.

To summarize, at every level in the hierarchy, and for every object, we need to rank at most $O(D^3 \log n)$
samples. Consequently, we need to ask at most $O(nD^3 \log^2 n \log\log n^{D^3})$ questions in total to rank at
most $O(D^3 \log n)$ objects for every object and level. The algorithm only fails with negligible (inverse
polynomial in $n$) probability if an object has no sample that falls within $\lambda_i$ at any level $i$. ■

B. Proof of Theorem 1

Proof: The upper bound on the number of questions to be asked in the learning phase is immediate
from Theorem 4. For each object, we need to store one identifier (the identifier of the closest object) at
every level $i$ in the hierarchy, and one bit to mark it as a member of $S_i$ or not. Hence, the total memory
requirements² do not exceed $O(n \log^2 n/\log(2D))$ bits. Finally, Properties 1 to 5 shown in the proof of
Theorem 4 in Appendix A are also true for an external query object $q$. Hence, to find the closest object
to $q$ on every level, we need to ask at most $O(D^3 \log^2 n \log\log n^{D^3})$ questions. In particular, the closest
object at level $L = \log_{2D}(n)$ will be $q$'s nearest neighbor w.h.p. ■

C. Proof of Lemma 2

Proof: Consider the configuration given in Figure 2. We need to show that for all triples $x, y, z$,
where $x, y, z \in T \cup \{q\}$, we have $r_x(y) \le D(r_z(x) + r_z(y))$. First, let us consider two nodes $x$ and $y$
such that $d(x, y) = d$, with $d > 1$. Clearly, these two nodes must be in two different supernodes, as the
maximum distance inside a supernode is strictly smaller than 1. Further, we have $|B_x(d)| \le 4\alpha^2 d$.
Indeed, even if $x$ is in the supernode at the center of the star, there are at most $\alpha d$ other supernodes within
distance $d$. Each supernode can contain at most $\alpha$ nodes. Further, the query point could be within distance
$d$ of $x$; in that case there could be on the order of $d\alpha^2$ additional objects in the ball. On the other hand, we have
$|B_z(d/2)| \ge d\alpha/2$. Indeed, even if $z$ is placed at the end of a branch, there are at least $d$ supernodes
within distance $d$, each containing $\alpha$ nodes. Hence, we have $r_x(y) \le 4\alpha^2 d \le 2D\alpha d \le D(r_z(x) + r_z(y))$
by setting $\alpha = D/2$. We have used the fact that $r_z(x) + r_z(y) \ge |B_z(j)| + |B_z(d - j)| \ge 2|B_z(d/2)|$.

If the distance is smaller than 1, then $x$ and $y$ must be inside the same supernode. In that case, we
have $r_x(y) \le \alpha \le 2D \le D(r_z(x) + r_z(y))$. We can prove the other inequalities in a similar way. ■

D. Proof of Theorem 2

Proof: Consider the graph metric with shortest path distance in Figure 2. Yao's minimax principle
(see [13]) states that, for any distribution on the inputs, the expected cost for the best deterministic
algorithm provides a lower bound on the expected running time of any randomized algorithm. The
graph (solid lines in Figure 2) is known. It consists of a star with $\alpha$ branches, each composed of $\frac{n}{\alpha^2}$
supernodes. Each of the supernodes in turn contains $\alpha$ database objects (i.e., objects in $T$). Clearly, in
total there are $\alpha \cdot \frac{n}{\alpha^2} \cdot \alpha = n$ objects. We know the answers to all questions of the type $O(a, b, c)$, where
² Making the assumption that every object can be uniquely identified with $\log n$ bits.

Fig. 2. A graph with disorder constant $\alpha$ and shortest path distance. The graph forms a star with $\alpha$ branches. Each branch
is composed of $n/\alpha^2$ "supernodes". Each edge between the "roots" of the supernodes in the star has weight 1 ($w$ denotes the
weight in the figure). Each supernode (see zoomed region on the right side of the figure) is in turn composed of a "root", and a
smaller $\alpha$-ary tree (of depth 1) consisting of $\alpha$ actual database objects. The weights of the edges in the tree range from $1/4\alpha$
to $\alpha/4\alpha$. Finally, a query point is randomly connected to one non-root node on each branch of the star with edges of weights
ranging from 1 to $1 + \epsilon$ (dashed lines), where $\epsilon \ll 1/4\alpha$. The distances on the graph are shortest path distances. There are
$(\frac{n}{\alpha})^\alpha$ ways to connect the query point to the database. Further, for each such configuration, there are $\alpha$ possible choices for
the nearest neighbor (the direct neighbor which is connected to the query point with the edge of smallest weight). We assign
weights to the edges connecting direct neighbors to the query point in such a way that each of the direct neighbors is equally
likely to be the nearest neighbor, and each weight is different.

$a, b, c \in T$. We attach a query point $q$ to that graph, and we assume that each "position" of the query
point (as shown in Fig. 2) is equally likely. That is, the query point is attached with an edge to one (non-root)
object chosen u.a.r. on each branch of the star. This object is called a direct neighbor. The
weights of the corresponding edges are chosen between 1 and $1 + \epsilon$ in a random way as well (such
that we do not have ties, and each of the direct neighbors is equally likely to be the nearest neighbor).
In other words, the input distribution is uniform over all configurations. First, note that $q$'s nearest
neighbor must be one of the objects connected directly to it, i.e., one of the $\alpha$ direct neighbors. Indeed,
let $\delta = \{u \in T \mid u \text{ is a direct neighbor of } q\}$. Then, we have $d(u, q) < d(v, q)$ when $u \in \delta$ and $v \in T \setminus \delta$.
Further, any of these direct neighbors could be $q$'s nearest neighbor with equal probability. Assume that
we are given for free the answers to all questions, except the questions of the type $O(q, x, y)$, where
both $x, y \in \delta$. This amounts to knowing which are the direct neighbors, but not knowing anything about
the ranking of those direct neighbors with respect to $q$. Indeed, by construction, all direct neighbors are
closer to the query point than any other object in the database. Hence, if we used the oracle to compare a
direct neighbor with another object (which is not a direct neighbor), the oracle would always answer that
the direct neighbor is closer to the query point. So, we could not exclude one of the direct neighbors as
the nearest neighbor (we do not learn anything about the nearest neighbor). Hence, in order to identify
the nearest neighbor, the best deterministic algorithm must ask at least $\alpha$ questions to find the nearest
neighbor among the direct neighbors (we must traverse the list of direct neighbors and ask the oracle
to compare every object with the current best candidate). Consequently, we must first identify all direct
neighbors, and then compare them with each other.

Note that there are $(\frac{n}{\alpha})^\alpha$ ways to choose the direct neighbors, and that each configuration is equally
likely. Identifying all direct neighbors is equivalent to knowing which of these configurations we are
in. Let $X$ denote the random variable of which each outcome corresponds to a configuration. Then, the
entropy of $X$ is $\log(\frac{n}{\alpha})^\alpha = \alpha\log(n/\alpha^2) + \alpha\log\alpha$ bits. The answer to every question we ask the oracle
will reduce the uncertainty about which configuration we are in. In order for the probability of error $P_e$
to be equal to zero, i.e., in order to be sure that we found all the direct neighbors, Fano's inequality (see
[16], p. 39) tells us that we must know at least a set of answers $A$ such that $H(X \mid A) = 1$ bit to have
$P_e \ge 0$ as the resulting lower bound.

For every branch of the star, choosing a direct neighbor u.a.r. is equivalent to choosing a supernode
u.a.r., and then a direct neighbor inside that supernode u.a.r. First, assume that we know, on each branch,
in which supernode the direct neighbor is located. Let us focus on one branch, and the supernode on this
branch containing a direct neighbor. Denote that supernode by $\Phi$. In that case, in order to identify the
direct neighbor in $\Phi$, we must ask questions of the type $O(q, a, b)$, where $a, b \in \Phi$. Asking a question
where either $a$, $b$ or both are outside $\Phi$ does not tell us anything about which object is the direct neighbor,
as all objects inside $\Phi$ are closer to $q$ than any object outside that supernode. Further, note that the answer
to any question of the type $O(q, a, b)$, where $a, b \in \Phi$ and $d(z, b) > d(z, a)$, is $b$ only if $b$ is $q$'s direct
neighbor in $\Phi$. Hence, the answer to a question of this type allows us to exclude only one object at a
time³. Hence, for each of the $\alpha$ supernodes that contain a direct neighbor to $q$, we must ask an expected
$\Omega(\alpha)$ questions to identify the direct neighbor. Knowing all the direct neighbors, when the supernodes
in which they are located are known, reduces the entropy by $\alpha\log(\alpha)$ bits. Indeed, there are $\alpha$ such
supernodes, and $\alpha$ choices for the direct neighbor inside each of these supernodes (i.e., if we fix the
supernodes containing the direct neighbors, there are $\alpha^\alpha$ ways to choose the direct neighbors). As every
question only excludes one object inside a supernode as direct neighbor, in total we must ask $\Omega(\alpha^2)$
questions to the oracle.

Let us now remove the assumption that we know which supernodes contain a direct neighbor. There
are $(\frac{n}{\alpha^2})^\alpha$ ways to choose the supernodes that contain the direct neighbors. The entropy for this random
choice is consequently $\alpha\log(n/\alpha^2)$ bits. Thus, at best, we need to ask $\alpha\log(n/\alpha^2)$ questions (in the
best case each question reduces the number of possible configurations by a factor of 2) in order to know in which
supernodes the direct neighbors are located. In total, we consequently need to ask at least an expected
$\Omega(\alpha\log\frac{n}{\alpha} + \alpha^2)$ questions to reduce the entropy by $\log(\frac{n}{\alpha})^\alpha = \alpha\log(n/\alpha^2) + \alpha\log\alpha$ bits and have
$P_e \ge 0$. By letting $\alpha = \Theta(D)$, we obtain the claim. ■
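The entropy bookkeeping in this proof can be checked numerically with the short sketch below. The function names are illustrative, logarithms are taken base 2 (bits), and `question_count` simply adds the two terms from the proof with all constants dropped, so it is an order-of-magnitude figure rather than an exact bound.

```python
import math


def configuration_entropy(n: int, alpha: int) -> float:
    """Entropy in bits of a uniform choice among (n/alpha)^alpha configurations."""
    return alpha * math.log2(n / alpha)


def entropy_split(n: int, alpha: int) -> tuple[float, float]:
    """(bits to locate the supernodes, bits to pick the object inside each supernode)."""
    return alpha * math.log2(n / alpha**2), alpha * math.log2(alpha)


def question_count(n: int, alpha: int) -> float:
    """alpha*log2(n/alpha^2) questions to locate supernodes plus alpha^2 inside them."""
    return alpha * math.log2(n / alpha**2) + alpha**2
```

For `n = 2**20` and `alpha = 16`, the total entropy is 256 bits, split into 192 bits for the supernodes and 64 bits for the objects inside them.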

E. Proof of Lemma 3

Proof: The result follows directly from the approximate triangle inequality (see Definition 2). The
lower bound follows from inequality 3 for (1) and inequality 2 for (2). The upper bound follows from
inequality 2 for (1) and inequality 3 for (2). ■
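The four bounds can be spelled out as follows. This is a sketch that assumes inequality 2 reads $r_x(y) \le D(r_x(z) + r_y(z))$ and inequality 3 reads $r_x(y) \le D(r_x(z) + r_z(y))$, which are the forms suggested by the way they are applied in Appendix A; Definition 2 remains the authoritative statement. The symbol $\zeta$ denotes the rank threshold of Lemma 3.

```latex
% Assumed forms (see the lead-in):
%   (ineq. 2)  r_x(y) \le D\,(r_x(z) + r_y(z))
%   (ineq. 3)  r_x(y) \le D\,(r_x(z) + r_z(y))
\begin{align*}
\intertext{Case (1): $r_x(u) = j$ and $r_y(u) < \zeta$.}
  r_x(y) &\le D\,\bigl(r_x(u) + r_y(u)\bigr) < D\,(j + \zeta)
    && \text{(ineq. 2 with } z = u\text{)} \\
  j = r_x(u) &\le D\,\bigl(r_x(y) + r_y(u)\bigr) < D\,\bigl(r_x(y) + \zeta\bigr)
    \;\Rightarrow\; r_x(y) > \tfrac{j}{D} - \zeta
    && \text{(ineq. 3 applied to } r_x(u) \text{ with } z = y\text{)} \\
\intertext{Case (2): $r_x(u) = j$ and $r_u(y) < \zeta$.}
  r_x(y) &\le D\,\bigl(r_x(u) + r_u(y)\bigr) < D\,(j + \zeta)
    && \text{(ineq. 3 with } z = u\text{)} \\
  j = r_x(u) &\le D\,\bigl(r_x(y) + r_u(y)\bigr) < D\,\bigl(r_x(y) + \zeta\bigr)
    \;\Rightarrow\; r_x(y) > \tfrac{j}{D} - \zeta
    && \text{(ineq. 2 applied to } r_x(u) \text{ with } z = y\text{)}
\end{align*}
```

In both cases the two bounds combine to $\frac{j}{D} - \zeta < r_x(y) < D(j + \zeta)$, which is the annulus of the lemma.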

³ The same is true if we ask questions where $a$ and $b$ are in different supernodes. What matters is that we can only exclude
one object as being a direct neighbor every time we ask a question.

F. Proof of Theorem 5

Proof: During a learning phase we sample $m$ objects $S = \{s_1, \ldots, s_m\}$ u.a.r. in $T$, and rank all
other objects with respect to the objects in $S$, i.e., $\forall s \in S, u \in T$, we compute $r_s(u, T)$ by querying
the oracle (this can be done by asking $O(mn \log n)$ questions). In the search phase, we start by finding
the point in $S$ closest to $q$, that is, we want to find $x = \arg\min_{s \in S} r_q(s)$. This can be done in $m$
steps by traversing the list of objects in $S$ sequentially and storing the closest element seen so far. In
particular, for every object in $S$, we ask the oracle whether it is closer to $q$ than the current minimum,
and if so it becomes the new minimum. Then, using binary search, we can find $j' = r_x(q)$ (i.e., we
ask the oracle whether $q$ is closer to $x$ than the element $y$ such that $r_x(y) = n/2$, and then apply
this process recursively on the new "interval"). Now, given that in the learning phase we sample $m$
objects u.a.r. in $T$, we know that $P[r_q(x) = j] = \frac{m}{n}(1 - \frac{j}{n})^m$. Further, we know by the triangle inequality
that $\frac{j}{D} < j' < Dj$. Hence, by Lemma 3, all objects $o$ such that $r_q(o, T) < R$ must lie in an annulus
centered at $x$ such that $\frac{j}{D^2} - R < r_x(o) < D^2 j + DR$ (see Figure 3). This annulus contains at most
$(D + 1)R + j(D^2 - \frac{1}{D^2}) < (D + 1)R + jD^2$ objects, of which $R$ are the $R$ nearest neighbors of $q$. Hence,
by sampling $(D + 1) + \frac{jD^2}{R}$ times, we will retrieve an $R$-nearest neighbor with constant probability. Thus,
the expected number of times we need to sample is $\sum_{j=1}^{n} \frac{m}{n}(1 - \frac{j}{n})^m \left((D + 1) + \frac{jD^2}{R}\right) \le (D + 1) + \frac{D^2 n}{mR}$.

For every sample, we ask the oracle if it is closer to $q$ than the currently closest sampled point. If so,
we store this point, else we delete it. ■

Fig. 3. The $R$ nearest neighbors of $q$ must lie in an annulus around $x$, with inner rank $j'/D - R$ and outer rank $D(j' + R)$.
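The learning and search phases of this proof can be sketched as below. Several simplifications are assumptions made for illustration: the oracle is a toy one over points on a line, ranks are list indices (the sample itself has index 0), the annulus is built from the measured $j' = r_x(q)$ as in Figure 3, and the annulus is scanned exhaustively rather than sampled at random as in the proof. The names `make_line_oracle`, `learn`, and `search` are hypothetical.

```python
import functools
import math
import random


def make_line_oracle(position):
    """Toy oracle (an assumption): oracle(x, a, b) returns the one of a, b closer to x."""
    def oracle(x, a, b):
        return a if abs(position[a] - position[x]) <= abs(position[b] - position[x]) else b
    return oracle


def learn(T, m, oracle, rng):
    """Learning phase: sample m objects and sort T w.r.t. each one using comparisons only."""
    order = {}
    for s in rng.sample(T, m):
        cmp = lambda a, b, s=s: -1 if oracle(s, a, b) == a else 1
        order[s] = sorted(T, key=functools.cmp_to_key(cmp))   # order[s][r] has rank r w.r.t. s
    return order


def search(q, order, oracle, D, R):
    """Search phase: closest sample x, binary search for j' = r_x(q), then scan the annulus."""
    samples = list(order)
    x = samples[0]
    for s in samples[1:]:
        x = oracle(q, x, s)                    # keep the sample closest to q (m - 1 questions)
    ranked = order[x]
    lo, hi = 0, len(ranked)
    while lo < hi:                             # count the objects that are closer to x than q
        mid = (lo + hi) // 2
        if oracle(x, ranked[mid], q) == ranked[mid]:
            lo = mid + 1
        else:
            hi = mid
    j = lo                                     # measured rank of q w.r.t. x
    first = max(0, math.floor(j / D - R))
    last = min(len(ranked) - 1, math.ceil(D * (j + R)))
    best = ranked[first]
    for u in ranked[first + 1:last + 1]:       # exhaustive scan of the annulus (simplification)
        best = oracle(q, best, u)
    return best
```

Knowing `D` is what makes the annulus narrow: with a pessimistic value such as `D = n`, the annulus covers the whole database and the search degenerates into a linear scan.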

G. Proof of Lemma 4

Proof: Clearly, if $|S| = n$, for any pair of points $u$ and $v$, we have $r_u(v, S) \le n$. Hence, $r_u(v) +
r_v(u) \le 2n$ for all $u, v$. In case the distances are symmetric in the hidden space, we can rank the
distances from the smallest distance to the largest distance. Consider the pair $v, w$ such that $d(v, w) =
d(w, v) \ge d(i, j)$ for all $i, j$. Then, we clearly have $r_v(w) = n$ and $r_w(v) = n$, since there cannot be
any point further away from $v$ than $w$, and vice versa. ■

H. Proof of Theorem 6

Proof: We need to compute the probability that $k$ is such that $\frac{4D}{n}k^2 + 2n - 2k \le \epsilon 2n$, or equivalently
$\frac{2D}{n}k^2 - k + (1 - \epsilon)n \le 0$. Solving for $k$, we obtain $k = \frac{1 \pm \sqrt{1 - 8D(1 - \epsilon)}}{4D/n} = \frac{n}{4D} \pm \frac{n}{4D}\sqrt{1 - 8D(1 - \epsilon)}$. Hence,
the number of values of $k$ for which the above condition is fulfilled is
$\left|\frac{n}{4D} + \frac{n}{4D}\sqrt{1 - 8D(1 - \epsilon)} - \frac{n}{4D} + \frac{n}{4D}\sqrt{1 - 8D(1 - \epsilon)}\right| = \frac{n}{2D}\sqrt{1 - 8D(1 - \epsilon)}$.
As we choose $k$ u.a.r. from $n$ values, we have $P[\Delta'_S \le \epsilon 2n] = \frac{1}{2D}\sqrt{1 - 8D(1 - \epsilon)}$. ■
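The closed form can be compared against a direct count of the values of $k$ that satisfy the condition. The sketch below assumes the bound $\frac{4D}{n}k^2 + 2n - 2k$ on the expected diameter used in this proof; the function names are illustrative.

```python
import math


def good_cut_probability(D: float, eps: float) -> float:
    """Closed form (1/2D) * sqrt(1 - 8D(1 - eps)), or 0 when no k satisfies the condition."""
    disc = 1 - 8 * D * (1 - eps)
    return 0.0 if disc < 0 else math.sqrt(disc) / (2 * D)


def counted_good_cut_fraction(n: int, D: float, eps: float) -> float:
    """Fraction of k in 1..n with (4D/n) k^2 + 2n - 2k <= eps * 2n."""
    good = sum(1 for k in range(1, n + 1)
               if 4 * D * k * k / n + 2 * n - 2 * k <= eps * 2 * n)
    return good / n
```

Note that the square root is real only when $\epsilon \ge 1 - \frac{1}{8D}$, so for larger $D$ only very modest reductions of the diameter bound can be obtained from a single cut.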



I. Depth of binary tree

Proof: Let $\delta < 0.5$ be a constant independent of $n$, $D$. Consider a particular path in the binary tree
from the root to a leaf. Let $n_i$ denote the number of objects in the set at level $i$ and $k_i$ the rank of $x_2$
w.r.t. $x_1$ (i.e., $r_{x_1}(x_2)$) chosen at level $i$. Let $X_i = 1$ if $\delta n_i \le k_i \le (1 - \delta)n_i$. As $k_i$ is distributed u.a.r. in
$1, \ldots, n_i$, we have $P[X_i = 1] = 1 - 2\delta$. If $X_i = 1$, the number of objects is reduced by a factor of at least
$1 - \delta$ at this level, i.e., $\max\{|S_0|, |S_1|\} \le (1 - \delta)n_i$. As there are $n$ objects in total, we cannot reduce
the number of objects by a factor $(1 - \delta)$ more than $s = \frac{\log n}{\log(1/(1 - \delta))}$ times. In $m$ levels on the path, the
expected number of times $X_i$ is equal to 1 is $\mu = (1 - 2\delta)m$. If we set $m = \frac{2as}{1 - 2\delta}$ for
some constant $a > 1$, we have

$$P\left[\sum_{i=1}^{m} X_i < \mu/2\right] \le O(1/\mathrm{poly}(n))$$

by the Chernoff bound. There are at most $n$ paths (one per leaf). Taking the union bound over these
paths, we obtain the claim. ■
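The logarithmic depth can be explored with a small simulation of the idealized model used in this proof. The sketch assumes that a cut of a set of size `size` produces children of sizes `k` and `size - k` with `k` uniform in `1..size-1`, and it follows the larger child at every level; both choices are modeling assumptions, and `simulated_path_depth` is a hypothetical name.

```python
import math
import random


def simulated_path_depth(n: int, rng: random.Random) -> int:
    """Depth of the path that always descends into the larger of the two child sets."""
    depth, size = 0, n
    while size > 1:
        k = rng.randint(1, size - 1)     # idealized: rank of x2 w.r.t. x1 is uniform
        size = max(k, size - k)          # pessimistic choice: keep the larger set
        depth += 1
    return depth
```

Averaging `simulated_path_depth(n, rng) / math.log2(n)` over many runs gives an empirical feel for the constant hidden in the $O(\log n)$ depth under this model.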



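J. Proof of Theorem 3

Proof: First, we compute the probability that the hash function $h$ is different for two objects $u$ and $q$:

$$\begin{aligned}
P &= P[h(u) \ne h(q)] \\
  &= \sum_{i,j \in T} P[h(u) \ne h(q) \mid x_1 = i, x_2 = j]\, P[x_1 = i, x_2 = j] \\
  &= \frac{1}{n}\sum_i P\left[r_{x_1}(x_2) \in [r_{x_1}(q), r_{x_1}(u)] \mid x_1 = i\right] \\
  &= \frac{1}{n^2}\sum_i |r_i(u) - r_i(q)| \\
  &= \frac{1}{n^2}\|p_u - p_q\|_1
\end{aligned}$$

Hence, we have $P[h(u) = h(q) \mid r_q(u) \le r] = 1 - \frac{1}{n^2}\|p_u - p_q\|_1$, which the rank distortion function $f$
bounds from below, and similarly $P[h(v) = h(q) \mid r_q(v) \ge (1 + \epsilon)r] = 1 - \frac{1}{n^2}\|p_v - p_q\|_1$, which $f$
bounds from above.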



K. Proof of Corollary 1

Proof: The proof is analogous to the standard proof for locality-sensitive hashing of binary vectors.
More precisely, for an $(r, R, p, P)$-rank-sensitive hashing scheme, retrieving one of the $R$ nearest
neighbors of a query point $q$ will require $O(n^\theta)$ evaluations of the hash function, where $\theta$ is defined as
$\frac{\log(1/p)}{\log(1/P)}$. It can be shown that $\theta \le \frac{1}{1 + \epsilon'} = O(\frac{\gamma}{1 + \epsilon})$. Indeed, the probabilities $p$ and $P$ take the same
values as if we hashed binary vectors of dimension $n^2$, and let $r' = r$, and $(1 + \epsilon')r' = (1 + \epsilon)r/\gamma$. Then,
$\theta \le \frac{\gamma}{1 + \epsilon}$. ■
L. Numerical example for rank distortion

Fig. 4. The hidden space consists of 1600 points distributed u.a.r. on $[0, 1]^d$, where $d = 1, 2, 4$. To avoid border effects, we
compute distances with wrap-around. We plot $\|p_u - p_v\|_1$ against $r_u(v)$, for a fixed $u$. The results are averaged over 100
samples and the error bars correspond to the standard deviation. Note that the slope is first steep and then linear. Such a function
is appropriate for RSH, as the function $f$ increases monotonically. Further, the fact that we have a steep slope for small values
of $R$ makes those spaces particularly attractive. Indeed, this implies that $P$ decreases rapidly (so we can search for $R$-nearest
neighbors, even for small $R$), and $p$ is sufficiently large for small values of $r$. This example shows that for homogeneous spaces,
the rank distortion function is such that we can perform RSH efficiently.



